Robust N-gram Based Syntactic Analysis Using Segmentation Words

Authors

  • Nobuo Inui
  • Yoshiyuki Kotani
Abstract

We describe an N-gram based syntactic analysis using a dependency grammar. Instead of generalizing syntactic rules, N-gram information about parts of speech is used to segment a sequence of words into two clauses. A special part of speech, called a segmentation word, which corresponds to the beginning or end symbol of a clause, is introduced to express sentence structure. Segmentation words for each clause were learned using the hill climbing method and a small bracketed corpus. Experimental results for Japanese sentences showed that the N-gram based syntactic parser achieved 72.2% recall, which is about the same level of performance as a probabilistic context-free grammar based parser with human-made language-dependent information.
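The segmentation idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `<SEG>` marker, the tag names, the add-one smoothing, the bigram order, and the three-sentence training set are all assumptions made for illustration, and the hill-climbing step that learns segmentation words is not covered. The sketch scores every position at which a segmentation marker could be inserted into a part-of-speech sequence and keeps the position with the highest bigram log-probability.

```python
import math
from collections import Counter

SEG = "<SEG>"  # hypothetical marker standing in for a segmentation word


def train_bigrams(tagged_sequences):
    """Count POS bigrams from sequences that already contain the SEG marker."""
    bigrams, left_counts = Counter(), Counter()
    for seq in tagged_sequences:
        for left, right in zip(seq, seq[1:]):
            bigrams[(left, right)] += 1
            left_counts[left] += 1
    vocab = {tag for seq in tagged_sequences for tag in seq}
    return bigrams, left_counts, vocab


def log_score(seq, model):
    """Sum of add-one smoothed bigram log-probabilities over the sequence."""
    bigrams, left_counts, vocab = model
    total = 0.0
    for left, right in zip(seq, seq[1:]):
        prob = (bigrams[(left, right)] + 1) / (left_counts[left] + len(vocab))
        total += math.log(prob)
    return total


def best_split(pos_tags, model):
    """Return the index at which inserting SEG gives the highest score."""
    candidates = range(1, len(pos_tags))
    return max(
        candidates,
        key=lambda i: log_score(pos_tags[:i] + [SEG] + pos_tags[i:], model),
    )


# Toy bracketed data (assumed): each sequence is split into two clauses by SEG.
corpus = [
    ["NOUN", "PART", SEG, "NOUN", "PART", "VERB"],
    ["NOUN", "PART", SEG, "ADJ", "NOUN", "PART", "VERB"],
    ["ADJ", "NOUN", "PART", SEG, "NOUN", "PART", "VERB"],
]
model = train_bigrams(corpus)
split = best_split(["NOUN", "PART", "NOUN", "PART", "VERB"], model)
print(split)  # index of the first tag of the second clause
```

Applying the same split recursively to each resulting clause would give a bracketing of the sentence. Whether the parser in the paper proceeds this way is not stated in the abstract.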


Similar resources

Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System

Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...


Language Model Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech

Language modeling has many applications in a large variety of domains. Performance of this model depends on its adaptation to a particular style of data. Accordingly, adaptation methods endeavour to apply syntactic and semantic characteristics of the language for language modeling. The previous adaptation methods such as family of Dirichlet class language model (DCLM) extract class of history w...


Which is More Suitable for Chinese Word Segmentation, the Generative Model or the Discriminative One?

Since the traditional word-based n-gram model, a generative approach, cannot handle out-of-vocabulary (OOV) words in the test set, the character-based discriminative approach has been widely adopted recently. However, this discriminative model, though more robust to OOV words, fails to deliver satisfactory performance for those in-vocabulary (IV) words that have been observed before...


A Portable And Quick Japanese Parser: QJP

QJP is a portable and quick software module for Japanese processing. QJP analyzes a Japanese sentence into segmented morphemes/words with tags and a syntactic bunsetsu kakari-uke structure based on two strategies: a) morphological analysis based on character types and functional words, and b) syntactic analysis by simple treatment of structural ambiguities and ignoring semantic information. ...


Integration of morphological and syntactic analysis based on LR parsing algorithm

Morphological analysis of Japanese is very different from that of English, because no spaces are placed between words. The analysis includes segmentation of words. However, ambiguities in segmentation are not always resolved with morphological information alone. This paper proposes a method to integrate morphological and syntactic analysis based on the LR parsing algorithm. An LR table derived fr...




Publication date: 2001